ggplot2 (3): Bar charts

Dataset

library(ggplot2)
library(gcookbook)
head(pg_mean)

The data look like this:

> head(pg_mean)
  group weight
1  ctrl  5.032
2  trt1  4.661
3  trt2  5.526

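A basic plot

ggplot(pg_mean,aes(x=group, y=weight))+geom_bar(stat="identity")

Code notes: geom_bar() draws a bar chart. "geom" is short for geometric object, i.e. the kind of shape used to draw the data, such as points, lines, or polygons.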

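Type of the x-axis variable

The x-axis variable of a bar chart is normally discrete. If the variable in the data set is continuous, convert it to a factor with factor() first. The example below uses BOD, whose Time column is numeric; the first plot is drawn before the conversion: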

> BOD
  Time demand
1    1    8.3
2    2   10.3
3    3   19.0
4    4   16.0
5    5   15.6
6    7   19.8
str(BOD)
## 'data.frame':    6 obs. of  2 variables:
##  $ Time  : num  1 2 3 4 5 7
##  $ demand: num  8.3 10.3 19 16 15.6 19.8
##  - attr(*, "reference")= chr "A1.4, p. 270"

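Plot:

ggplot(BOD, aes(x=Time, y=demand)) + geom_bar(stat="identity")
# stat sets the statistical transformation applied to the data before drawing.
# stat="identity" uses the y value supplied for each x as the bar height.
# The default for geom_bar() is instead to count the rows at each x
# (stat="count" in ggplot2 2.0.0 and later, stat="bin" in older versions).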

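Now convert Time to a factor and plot again. The continuous axis above leaves an empty slot at Time = 6, where there is no observation; the discrete axis does not:

ggplot(BOD, aes(x=factor(Time), y=demand)) + geom_bar(stat="identity")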
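To see why the conversion matters, a minimal base-R sketch can inspect the factor directly. It uses the built-in BOD data set shown above; the variable name time_f is only illustrative.

```r
# BOD ships with base R (datasets package); Time has no observation for 6
time_f <- factor(BOD$Time)

# A factor only has levels for values that occur, so there is no "6" level
levels(time_f)
nlevels(time_f)

# On a continuous axis the gap between 5 and 7 is kept; on a discrete
# (factor) axis each level simply gets its own equally spaced slot
is.numeric(BOD$Time)
is.factor(time_f)
```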

Fill colour

The code below restyles the pg_mean bar chart with a light blue fill and a black outline:

ggplot(pg_mean,aes(x=group,y=weight))+geom_bar(stat="identity",fill="lightblue",colour="black")

Grouped bar chart

This example uses the cabbage_exp data set from the gcookbook package.

head(cabbage_exp) # inspect the data set
##   Cultivar Date Weight        sd  n         se
## 1      c39  d16   3.18 0.9566144 10 0.30250803
## 2      c39  d20   2.80 0.2788867 10 0.08819171
## 3      c39  d21   2.74 0.9834181 10 0.31098410
## 4      c52  d16   2.26 0.4452215 10 0.14079141
## 5      c52  d20   3.11 0.7908505 10 0.25008887
## 6      c52  d21   1.47 0.2110819 10 0.06674995

Plot:

ggplot(cabbage_exp,aes(x=Date,y=Weight,fill=Cultivar))+geom_bar(position="dodge",stat="identity")

Note position="dodge". "Dodge" means to avoid: with this argument the bars for the two Cultivar levels are placed side by side. Without it, they are stacked on top of each other, which is what the following call produces:

ggplot(cabbage_exp,aes(x=Date,y=Weight,fill=Cultivar))+geom_bar(stat="identity")

Setting colours

The RColorBrewer package provides ready-made colour palettes, which scale_fill_brewer() applies by name:

library(RColorBrewer)
ggplot(cabbage_exp,aes(x=Date,y=Weight,fill=Cultivar))+geom_bar(position="dodge",stat="identity",colour="red")+scale_fill_brewer(palette="Pastel1")

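Frequency bar chart

ggplot(diamonds,aes(x=cut))+geom_bar() # same as geom_bar(stat="count"); ggplot2 before 2.0.0 called this stat="bin"

With no y aesthetic, geom_bar() counts rows. When x is a categorical variable (a factor), it draws one bar per level, and the bar height is the number of observations at that level. When x is continuous, recent ggplot2 versions draw one bar per distinct x value, which resembles a histogram but is not binned; older versions binned the values and drew a true histogram, which geom_histogram() now does explicitly. The continuous case:

ggplot(diamonds,aes(x=carat))+geom_bar()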
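As a rough illustration of what the counting statistic computes, the base-R sketch below tabulates a small made-up vector (the values in cut_demo are hypothetical, not taken from diamonds). table() returns one count per distinct value, which is the number a count bar chart draws as the bar height when no y is supplied.

```r
# Hypothetical categorical data, for illustration only
cut_demo <- factor(c("Fair", "Good", "Good", "Ideal", "Ideal", "Ideal"),
                   levels = c("Fair", "Good", "Ideal"))

# One count per level: this is the summary a count bar chart displays
counts <- table(cut_demo)
counts

# With stat="identity" the heights are supplied directly instead,
# e.g. as a data frame with one row per bar
heights <- as.data.frame(counts, responseName = "n")
heights
```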

Colouring bars

upc <- subset(uspopchange,rank(Change)>40) # rank() returns each value's rank in ascending order, so this keeps the rows with the largest Change
upc
##             State Abb Region Change
## 3         Arizona  AZ   West   24.6
## 6        Colorado  CO   West   16.9
## 10        Florida  FL  South   17.6
## 11        Georgia  GA  South   18.3
## 13          Idaho  ID   West   21.1
## 29         Nevada  NV   West   35.1
## 34 North Carolina  NC  South   18.5
## 41 South Carolina  SC  South   15.3
## 44          Texas  TX  South   20.6
## 45           Utah  UT   West   23.8
ggplot(upc,aes(x=Abb,y=Change,fill=Region))+geom_bar(stat="identity")
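The rank() filter can be checked on a small made-up vector (the numbers below are hypothetical). rank() returns the position each value would take in ascending order, so keeping ranks above length(x) - k selects the k largest values. Assuming uspopchange has 50 rows, rank(Change) > 40 therefore keeps the 10 largest changes.

```r
# Hypothetical values, for illustration only
change <- c(5.2, 17.6, 9.1, 24.6, 3.3)

# rank 1 is the smallest value, rank 5 the largest
rank(change)

# Keep the 2 largest values: ranks greater than length(change) - 2
top2 <- change[rank(change) > length(change) - 2]
top2
```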

Colours can also be set manually with hex colour codes:

ggplot(upc,aes(x=reorder(Abb,Change),y=Change,fill=Region))+geom_bar(stat="identity",colour="black")+scale_fill_manual(values=c("#669933","#FFCC66"))+xlab("State")
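reorder(Abb, Change) works by reordering the factor levels rather than the rows. A small sketch with made-up values (abb and change below are hypothetical) shows the effect:

```r
# Hypothetical state abbreviations and values, for illustration only
abb    <- c("AZ", "CO", "FL")
change <- c(24.6, 16.9, 17.6)

# Default factor levels are alphabetical
levels(factor(abb))

# reorder() sorts the levels by the second argument (ascending);
# discrete axes are drawn in level order
abb_sorted <- reorder(abb, change)
levels(abb_sorted)

# Negate the values for descending order
levels(reorder(abb, -change))
```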

Colouring positive and negative bars

csub <- subset(climate,Source=="Berkeley" & Year >= 1900)
csub$pos <- csub$Anomaly10y>=0 # TRUE for values >= 0, FALSE for negative values
ggplot(csub,aes(x=Year,y=Anomaly10y,fill=pos))+geom_bar(stat="identity",position="identity")

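Adjusting the fill colours

ggplot(csub,aes(x=Year,y=Anomaly10y,fill=pos))+geom_bar(stat="identity",position="identity",colour="black",size=0.25)+scale_fill_manual(values=c("#CCEEFF","#FFDDDD"),guide="none")
# guide="none" removes the legend (older code used guide=FALSE, which is deprecated in recent ggplot2)
# size sets the outline width here; ggplot2 3.4.0 and later prefer linewidth=0.25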

Adjusting bar width and spacing

First, the original plot with the default width (0.9):

ggplot(pg_mean,aes(x=group,y=weight))+geom_bar(stat="identity")

Set the bar width to 0.5 (narrower bars leave wider gaps):

ggplot(pg_mean,aes(x=group,y=weight))+geom_bar(stat="identity",width=0.5)

Set the bar width to 1, the maximum, so that adjacent bars touch:

ggplot(pg_mean,aes(x=group,y=weight))+geom_bar(stat="identity",width=1)

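Changing the spacing in a grouped bar chart

Original plot, with narrower bars:

ggplot(cabbage_exp,aes(x=Date,y=Weight,fill=Cultivar))+geom_bar(stat="identity",width=0.5,position="dodge")

Modified: pass position=position_dodge() with a dodge width larger than the bar width, which adds space between the bars within each group:

ggplot(cabbage_exp,aes(x=Date,y=Weight,fill=Cultivar))+geom_bar(stat="identity",width=0.5,position=position_dodge(0.7))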

Stacked bar chart

ggplot(cabbage_exp,aes(x=Date,y=Weight,fill=Cultivar))+geom_bar(stat="identity")+guides(fill=guide_legend(reverse=TRUE))

Note the effect on the legend of guides(fill=guide_legend(reverse=TRUE)), which reverses the order of the legend keys. For comparison, the same plot with reverse=FALSE (the default):

ggplot(cabbage_exp,aes(x=Date,y=Weight,fill=Cultivar))+geom_bar(stat="identity")+guides(fill=guide_legend(reverse=FALSE))

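Changing the stacking order

This uses the desc() function from the plyr package.

library(plyr)
ggplot(cabbage_exp,aes(x=Date,y=Weight,fill=Cultivar,order=desc(Cultivar)))+geom_bar(stat="identity")
# The order aesthetic is only honoured by ggplot2 versions before 2.0.0.
# In recent versions, geom_bar(stat="identity",position=position_stack(reverse=TRUE)) reverses the stack instead.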

Polishing the stacked bar chart

library(RColorBrewer)
ggplot(cabbage_exp,aes(x=Date,y=Weight,fill=Cultivar))+geom_bar(stat="identity",colour="black")+guides(fill=guide_legend(reverse=TRUE))+scale_fill_brewer(palette="Pastel1")

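Percentage (100% stacked) bar chart

ce <- ddply(cabbage_exp,"Date",transform,percent_weight=Weight/sum(Weight)*100) # ddply() comes from plyr; computes each weight as a percentage of its Date total
ggplot(ce,aes(x=Date,y=percent_weight,fill=Cultivar))+geom_bar(stat="identity")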
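The ddply() call divides each weight by the total for its date. The same arithmetic can be sketched in base R with ave(), here on a hand-typed copy of the Cultivar, Date and Weight columns printed earlier (cab and date_total are illustrative names, and this is an alternative sketch, not the method used above):

```r
# Columns copied from the head(cabbage_exp) output shown earlier
cab <- data.frame(
  Cultivar = c("c39", "c39", "c39", "c52", "c52", "c52"),
  Date     = c("d16", "d20", "d21", "d16", "d20", "d21"),
  Weight   = c(3.18, 2.80, 2.74, 2.26, 3.11, 1.47)
)

# ave() returns, for every row, the sum of Weight within that row's Date
date_total <- ave(cab$Weight, cab$Date, FUN = sum)
cab$percent_weight <- cab$Weight / date_total * 100

# Within each date the percentages add up to 100
tapply(cab$percent_weight, cab$Date, sum)
```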

Polishing the percentage bar chart

ggplot(ce,aes(x=Date,y=percent_weight,fill=Cultivar))+geom_bar(stat="identity",colour="black")+guides(fill=guide_legend(reverse=TRUE))+scale_fill_brewer(palette="Pastel1")

Adding data labels

Labels just below the top of each bar:

ggplot(cabbage_exp,aes(x=interaction(Date,Cultivar),y=Weight))+geom_bar(stat="identity")+geom_text(aes(label=Weight),vjust=1.5,colour="white")

Labels just above the top of each bar:

ggplot(cabbage_exp,aes(x=interaction(Date,Cultivar),y=Weight))+geom_bar(stat="identity")+geom_text(aes(label=Weight),vjust=-0.2,colour="blue")

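Adjusting the y axis and the labels

Labels placed above the bars can be clipped at the top of the plot. One fix is to raise the upper limit of the y axis:

ggplot(cabbage_exp,aes(x=interaction(Date,Cultivar),y=Weight))+geom_bar(stat="identity")+geom_text(aes(label=Weight),vjust=-0.2)+ylim(0,max(cabbage_exp$Weight)*1.05)

Another is to set the label's y position explicitly so that it sits above the top of the bar:

ggplot(cabbage_exp,aes(x=interaction(Date,Cultivar),y=Weight))+geom_bar(stat="identity")+geom_text(aes(y=Weight+0.1,label=Weight))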

Labels on a grouped bar chart

ggplot(cabbage_exp,aes(x=Date,y=Weight,fill=Cultivar))+
  geom_bar(stat="identity",position="dodge")+
  geom_text(aes(label=Weight),vjust=1.5,colour="white",position=position_dodge(0.9),size=5)

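Labels on a stacked bar chart: at the top of each segment

ce <- arrange(cabbage_exp,Date,desc(Cultivar)) # arrange() and desc() come from plyr
ce <- ddply(ce,"Date",transform,label_y=cumsum(Weight))
# cumsum() returns running totals, e.g. cumsum(seq(1,10))
# Since ggplot2 2.2.0 the first fill level is stacked on top, so the rows are sorted with desc(Cultivar)
# to make the running totals match the bars; with older versions use arrange(cabbage_exp,Date,Cultivar)
ggplot(ce,aes(x=Date,y=Weight,fill=Cultivar))+
  geom_bar(stat="identity")+
  geom_text(aes(y=label_y,label=Weight),vjust=1.5,colour="white")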
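The label positions are just running totals within each bar. A minimal base-R sketch for a single date, using the d16 weights from the table above and assuming the segments are listed from the bottom of the stack upward (the vector names are illustrative):

```r
# Segment heights for one bar, listed bottom segment first
weight <- c(2.26, 3.18)

# Top edge of each segment: running total of the heights
label_top <- cumsum(weight)
label_top

# Middle of each segment: top edge minus half the segment height
label_mid <- cumsum(weight) - 0.5 * weight
label_mid
```

If the row order does not match the order in which the segments are stacked, the labels land on the wrong segment, so the sort before cumsum() matters.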

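Labels on a stacked bar chart: in the middle of each segment

ce <- arrange(cabbage_exp,Date,desc(Cultivar)) # rows sorted with desc(Cultivar) to match the stacking order of ggplot2 2.2.0 and later
ce <- ddply(ce,"Date",transform,label_y=cumsum(Weight)-0.5*Weight) # midpoint = top of segment minus half its height
ggplot(ce,aes(x=Date,y=Weight,fill=Cultivar))+
  geom_bar(stat="identity")+
  geom_text(aes(y=label_y,label=Weight),colour="white")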

Labels on a stacked bar chart: adding units

ce <- arrange(cabbage_exp,Date,desc(Cultivar)) # rows sorted with desc(Cultivar) to match the stacking order of ggplot2 2.2.0 and later
ce <- ddply(ce,"Date",transform,label_y=cumsum(Weight)-0.5*Weight)
ggplot(ce,aes(x=Date,y=Weight,fill=Cultivar))+
  geom_bar(stat="identity",colour="black")+
  geom_text(aes(y=label_y,label=paste(format(Weight,nsmall=2),"kg")),size=4)+
  guides(fill=guide_legend(reverse=TRUE))+
  scale_fill_brewer(palette="Blues")
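The label text is built in two steps: format() gives every number the same number of decimals, and paste() appends the unit. A short sketch on three of the weights from the table (w, formatted and labels are illustrative names):

```r
w <- c(3.18, 2.80, 2.74)

# nsmall=2 guarantees at least two digits after the decimal point,
# so 2.8 is shown as "2.80" instead of "2.8"
formatted <- format(w, nsmall = 2)
formatted

# paste() joins its arguments with a single space by default
labels <- paste(formatted, "kg")
labels
```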

Cleveland dot plot

Basic plot

Cleveland dot plots use geom_point(). Start with the most basic version:

tophit <- tophitters2001[1:25,] # first 25 rows of tophitters2001 (from gcookbook)
ggplot(tophit,aes(x=avg,y=name))+geom_point()

Sorting

In the plot above the names are in alphabetical order. In the plot below they are ordered by avg, using reorder():

ggplot(tophit,aes(x=avg,y=reorder(name,avg)))+
  geom_point(size=3)+
  theme_bw()+
  theme(panel.grid.major.x=element_blank(),
       panel.grid.minor.x=element_blank(),
       panel.grid.major.y=element_line(colour="grey60",linetype="dashed"))

Swapping the x and y axes

ggplot(tophit,aes(x=reorder(name,avg),y=avg))+
  geom_point(size=3)+
  theme_bw()+
  theme(axis.text.x = element_text(angle=60,hjust=1),
        panel.grid.major.y=element_blank(),
        panel.grid.minor.y=element_blank(),
        panel.grid.major.x=element_line(colour="grey60",linetype="dashed"))

Lollipop (matchstick) plot

nameorder <- tophit$name[order(tophit$lg,tophit$avg)]
tophit$name <- factor(tophit$name,levels=nameorder)
> head(tophit)
         id   first   last           name year stint team lg   g  ab   r   h 2b 3b hr rbi sb cs
1 walkela01   Larry Walker   Larry Walker 2001     1  COL NL 142 497 107 174 35  3 38 123 14  5
2 suzukic01  Ichiro Suzuki  Ichiro Suzuki 2001     1  SEA AL 157 692 127 242 34  8  8  69 56 14
3 giambja01   Jason Giambi   Jason Giambi 2001     1  OAK AL 154 520 109 178 47  2 38 120  2  0
4 alomaro01 Roberto Alomar Roberto Alomar 2001     1  CLE AL 157 575 113 193 34 12 20 100 30  6
5 heltoto01    Todd Helton    Todd Helton 2001     1  COL NL 159 587 132 197 54  2 49 146  7  5
6  aloumo01  Moises   Alou    Moises Alou 2001     1  HOU NL 136 513  79 170 31  1 27 108  5  1
   bb  so ibb hbp sh sf gidp    avg
1  82 103   6  14  0  8    9 0.3501
2  30  53  10   8  4  4    3 0.3497
3 129  83  24  13  0  9   17 0.3423
4  80  71   5   4  9  9    9 0.3357
5  98 104  15   5  1  5   14 0.3356
6  57  57  14   3  0  8   18 0.3314
ggplot(tophit,aes(x=avg,y=name))+
  geom_segment(aes(yend=name),xend=0,colour="grey")+
  geom_point(size=3,aes(colour=lg))+
  scale_colour_brewer(palette="Set1",limits=c("NL","AL"))+
  theme_bw()+
  theme(panel.grid.major.y=element_blank(),
        legend.position=c(1,0.55),
        legend.justification=c(1,0.5))

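Note: order() sorts the elements of a vector in ascending order and returns their positions (indices) in the original vector, whereas sort() returns the sorted values themselves.

a<-c(3,9,0,12,19)
sort(a) # returns the sorted values
## [1]  0  3  9 12 19
order(a) # returns the original positions of the elements, in sorted order
## [1] 3 1 2 4 5
# 3 means the 3rd element of the original vector comes first; 1 means the 1st element comes second, and so on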
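The nameorder line in the lollipop example passes two keys to order(): rows are sorted by the first key, and ties are broken by the second. A small sketch with made-up players (all values hypothetical) shows how the resulting order becomes the factor levels:

```r
# Indexing by order() reproduces sort() for a single key
a <- c(3, 9, 0, 12, 19)
identical(a[order(a)], sort(a))

# Hypothetical data, for illustration only
name <- c("A", "B", "C", "D")
lg   <- c("NL", "AL", "NL", "AL")
avg  <- c(0.350, 0.340, 0.330, 0.345)

# Sort by league first, then by average within each league
idx <- order(lg, avg)
idx

# Use the sorted names as factor levels so a discrete axis follows this order
name_f <- factor(name, levels = name[idx])
levels(name_f)
```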

Faceting by league (lg)

ggplot(tophit,aes(x=avg,y=name))+
  geom_segment(aes(yend=name),xend=0,colour="grey50")+
  geom_point(size=3,aes(colour=lg))+
  scale_colour_brewer(palette="Set1",limits=c("NL","AL"),guide="none")+
  theme_bw()+
  theme(panel.grid.major.y=element_blank())+
  facet_grid(lg~.,scales="free_y",space="free_y")

References

常肖楠, 邓一硕, 魏太云. R数据可视化手册[M]. 人民邮电出版社, 2014.